Automatic Continuous Speech Recognition with Rapid Speaker Adaptation for Human/machine Interaction

Author

  • Nikko Ström

Abstract

This thesis presents work in three main directions of the automatic speech recognition field. Work in two of these, dynamic decoding and hybrid HMM/ANN speech recognition, has resulted in a real-time speech recognition system that is currently in use in WAXHOLM, the human/machine dialogue demonstration system developed at the department. The third direction is fast unsupervised speaker adaptation, where “fast” refers to adaptation with a small amount of adaptation speech.

The work in dynamic decoding involved the development of a continuous speech decoding engine based on the A* search paradigm. An efficient implementation of the algorithms has made real-time continuous speech recognition possible in the WAXHOLM dialogue system with a lexicon of about 1000 words. The thesis proposes features of the search algorithms that are important for real-time performance. These include efficient use of beam pruning and graph reduction methods that greatly reduce the effective search space.

Hybrid HMM/ANN recognition is an area of work in its own right, but it is also important in the speaker adaptation experiments. A very flexible ANN architecture was developed and refined during the course of the thesis work. The architecture is a generalization of the TDNN and RNN architectures and allows both delayed and look-ahead connections. In the latest experiments, sparsely connected networks were investigated and shown to perform significantly better than fully connected networks with an equal number of connections. In a phoneme recognition experiment on the TIMIT database, the recognition rate of the hybrid HMM/ANN system is in the range of the highest reported and is outperformed only by another hybrid system.

The fast speaker adaptation work is based on the notion that an explicit a priori model of speaker variability helps to adapt rapidly to a new speaker. In the experiments, a parametric speaker characterization is introduced in the ANN by adding special-purpose speaker-space input units whose activity values are determined by the speaker adaptation. Experiments were conducted with both the American English TIMIT database and the Swedish WAXHOLM database, and a positive adaptation effect is detected after only a few syllables.
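
The beam pruning mentioned in the abstract can be illustrated with a minimal sketch. This is not the decoder described in the thesis, which is based on A* search; it only shows the generic idea of discarding hypotheses whose log score falls too far below the current best. The function names `beam_prune` and `viterbi_step`, the dictionary-based model layout, and the example scores are all assumptions made for illustration.

```python
NEG_INF = float("-inf")


def beam_prune(hypotheses, beam_width):
    """Keep only hypotheses whose log score is within beam_width of the best.

    hypotheses: dict mapping a state name to its log score (hypothetical layout).
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    return {s: v for s, v in hypotheses.items() if v >= best - beam_width}


def viterbi_step(active, log_trans, log_emit, beam_width):
    """Extend every active state by one frame, then prune the result.

    active:    dict state -> log score of the best path ending in that state
    log_trans: dict state -> dict of next state -> log transition score
    log_emit:  dict state -> log emission score for the current frame
    """
    extended = {}
    for state, score in active.items():
        for nxt, trans in log_trans.get(state, {}).items():
            candidate = score + trans + log_emit.get(nxt, NEG_INF)
            # Viterbi recombination: keep only the best path into each state.
            if candidate > extended.get(nxt, NEG_INF):
                extended[nxt] = candidate
    return beam_prune(extended, beam_width)


# Toy example with made-up scores: s2 falls outside the beam and is dropped.
active = {"s0": 0.0}
log_trans = {"s0": {"s0": -0.5, "s1": -1.0, "s2": -6.0}}
log_emit = {"s0": -1.0, "s1": -0.2, "s2": -0.1}
survivors = viterbi_step(active, log_trans, log_emit, beam_width=3.0)
```

A narrower beam leaves fewer active states per frame and so reduces computation, at the risk of pruning the path that would eventually have scored best.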

Similar resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract: Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

ACOUSTIC MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND ANIMAL VOCALIZATION CLASSIFICATION

ACOUSTIC MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND ANIMAL VOCALIZATION CLASSIFICATION Jidong Tao, B.Eng., M.S. Marquette University, 2009 Automatic speech recognition (ASR) converts human speech to readable text. Acoustic model adaptation, also called speaker adaptation, is one of the most promising techniques in ASR for improving recognition accuracy. Adaptation works by tuning a g...

Hindi Speech Recognition and Online Speaker Adaptation

Speaker adaptation is a technique used to improve the recognition accuracy of Automatic Speech Recognition (ASR) systems. Here, we report a study of the impact of online speaker adaptation on the performance of a speaker-independent, continuous speech recognition system for the Hindi language. The speaker adaptation is performed using the Maximum Likelihood Linear Regression (MLLR) transfo...

Publication date: 1997